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Gravitational wave detectors will need optimal signal-processing algorithms to extract weak 
signals from the detector noise. Most algorithms designed to date are based on the unrealistic 
assumption that the detector noise may be modeled as a stationary Gaussian process. However 
most experiments exhibit a non-Gaussian "tail" in the probability distribution. This "excess" of 
large signals can be a troublesome source of false alarms. This article derives an optimal (in the 
Neyman-Pearson sense, for weak signals) signal processing strategy when the detector noise is non- 
Gaussian and exhibits tail terms. This strategy is robust, meaning that it is close to optimal for 
Gaussian noise but far less sensitive than conventional methods to the excess large events that form 
the tail of the distribution. The method is analyzed for two different signal analysis problems: (i) 
a known waveform (e.g., a binary inspiral chirp) and (ii) a stochastic background, which requires a 
multi-detector signal processing algorithm. The methods should be easy to implement: they amount 
to truncation or clipping of sample values which lie in the outlier part of the probability distribution. 



I. INTRODUCTION 

The construction of several new detectors of gravi- 
tational radiation is currently approaching completion. 
These instruments are of a different design and have 
significantly better sensitivity and broader bandwidth 
than previous detectors. They include the LIGO de- 
tector being built in the United States by a joint Cal- 
tech/MIT collaboration 0, ||, the VIRGO detector be- 
ing built near Pisa by an Italian/French collaboration 
the GEO-600 detector being built in Hannover by 
an Anglo/German collaboration Q, and the TAMA-300 
detector near Tokyo There are also several reso- 

nant bar detectors currently in operation |^ , and several 
more refined bar and interferometric detectors presently 
in the planning and proposal stages These instru- 
ments search for very weak signals. For the most likely 
sources, the signals will be buried in the noise of the 
detectors, and need to be extracted with sophisticated 
optimal signal-processing strategies . 

The standard assumption made in the literature is that 
the detector noise has multivariate Gaussian statistics. 
This assumption is certainly incorrect: every sensitive 
gravitational wave detector operated to date has been 
characterized by noise that is both non-stationary and 
non-Gaussian. Some experimentation has shown that 
this is a serious matter [^: existing detection strategies 
for both deterministic and stochastic signals do not per- 
form nearly as well when non-Gaussian noise is present. 
Roughly speaking, if the non-Gaussian fluctuations are 
large, they bias the statistics and make it more difflcult 
to achieve a given level of statistical confidence. 

In this paper, we develop a new set of statistical signal- 



processing techniques to search for deterministic and 
stochastic gravitational waves. These techniques are ro- 
bust, meaning that they will work well even if the detector 
noise is not Gaussian but falls into a broader statistical 
class that we expect includes realistic detectors. In large 
part, these new methods are similar to the older ones: 
one constructs matched filters to search for known wave- 
forms or cross-correlates the instrument outputs at the 
different detector sites to search for a stochastic back- 
ground. The essential difference is that by using locally 
optimal methods these statistical measures are mod- 
ified. The effect is to truncate the statistics: detector 
samples that fall outside the central Gaussian-like part 
of the sample distribution (i.e., the outliers) are excluded 
from (or saturated when constructing) the measurement 
statistic. For both robust statistic is found which 

performs better than the optimal linear filter in the case 
where the detector noise is non-Gaussian, and almost as 
well in the Gaussian-noise case. 

The paper is organized as follows. In Sec. II we de- 
rive the locally most powerful signal-processing tests for 
deterministic signals. We begin in Sec. II A with a deriva- 
tion of the Neyman-Pearson criteria for optimality, in the 
case where a known waveform is hidden in white noise. 
We define the power function of a test and derive a crite- 
ria for the locally optimal test in the weak-signal regime. 
The locally optimal test is analyzed for a number of dif- 
ferent types of non-Gaussian noise, and we show that 
the locally optimal decision statistic is a matched filter 
where the non-Ga ussia n sample values are truncated or 
excluded. In Sec. II B the results are generalized to the 
case where a known waveform is hidden in colored noise, 
and we introduce models for non-Gaussian colored noise. 
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In Sec. Ill, we tu rn to the detection of a stochastic back- 
ground. Section [II A considers the case of a stochastic 



signal (i.e., where the waveform is not known) and derives 
the locally optimal statistic which can be used to corre- 
late two identical detectors, where we assume that each 
detector has independent whit e noise and is co-aligned 
and coincident. In Sec. |III B|, these results are gener- 

noise is colored, and the 



coincident. In Sec. 
alized to the case where 



the 



detectors are in different locations, and not aligned in 
parallel. In Sec. IV, we discuss an implementation of 



these statistics, and we illustrate how one can compare 
the performance of different statistics using Monte Carlo 
simulations. Section ^ contains a short conclusion and 
summary. 



II. DETERMINISTIC SIGNALS 

A. Single detector, white noise 

In order to describe the idea in a simple way we first 
discuss the case where we are searching for a known signal 
in the data stream of a single detector, where the time- 
domain detector noise samples are independent in the 
time-domain. 

Denote the data stream of the first detector by xi — 
xij for J = 0, . . . , iV — 1. In this section, since we are go- 
ing to only consider this single detector, we will drop the 
subscript "1." Imagine that we are looking for a signal 
of known waveform but unknown amplitude e, which we 
will denote by esj. Our primary interest is in the case 
where the amplitude e is either small, or zero. For conve- 
nience, imagine for the moment that this parameter can 
have only two possible values, either e = or e = e 7^ 0. 

The detection problem that we need to solve is to par- 
tition the space of possible observations i?^ into two dis- 
joint subsets. When the observation x falls into one of 
these, we conclude that e = and that the null hypoth- 
esis is true. When the observation falls into the other 
set we conclude that the signal has been observed with 
e ^ 0. To describe the partition of into two regions, 
define a function S{x E R^) which is zero in the null 
hypothesis region and unity elsewhere. This function is 
called a test. Our goal is to find the "best" choice of a 
test S. 

To help characterize tests S, it is helpful to define the 
power function of a test: 



F{5\e) = f (5(x)p(x|e)d^x. 



(2.1) 



Here p(x|e) is the probability distribution of the mea- 
surement X given signal amplitude e. For example, for 
additive white, stationary Gaussian noise of unit vari- 
ance and vanishing mean, 



p(xi6) = n (2^)"' 



(2.2) 



The quality of the test can be expressed in terms of the 
power function. 

We characterize the quality of the test by the false 
alarm and the false dismissal probabilities. The false 
alarm probability is the probability with which we con- 
clude that e 7^ when in fact e vanishes. This is given 
by F{5\0). The false dismissal probability is the proba- 
bility with which we conclude that e = when in fact it 
is e = e 7^ 0. This is given by 1 — F{S\e). 

One standard definition of the "best" test 5 is that it 
minimizes the false dismissal probability for a given false 
alarm probability. This is called the Neyman-Pearson 
test. One can find this test using calculus of variations, 
with a Lagrange multiplier A to enforce the constraint 
that the false alarm probability is fixed. The best test is 
obtained by partitioning i?^ as follows. Choose a con- 
stant Aq > 0. Then, set S = 1 in regions where the 
likelihood ratio 



A = 



P(x|e) 
p(x|0) 



(2.3) 



is greater than Aq. Set i5 = elsewhere. (We assume 
that the boundary between these two regions is a set of 
probability measure zero.) The value of the constant Aq 
determines the false alarm probability. Thus, the likeli- 
hood ratio is a "decision statistic" : a number that can 
be calculated from the observed data. If the statistic is 
less than some value, we conclude that the null hypoth- 
esis holds. If the statistic is greater than this value, we 
conclude the opposite. The decision statistic provides a 
partition of the space of observations into two disjoint 

regions. 

In the case where the noise is Gaussian ( ^.2[ ) this cri- 
teria is easily understood. The optimal Neyman-Pearson 
test divides the space of observation along an (TV — 1)- 
dimensional plane. On one side of this plane ^ = 1 and on 
the other side S vanishes. The plane is defined by setting 
the likelihood ratio (2^) to a constant. For the Gaussian 
probability distribution (2.2) the plane is defined by 



N-l 



constant = J] e-(x.-e..)V2+.?/2 

i=0 
N-l 

E 



constant 



6 y XiSi. 

i=0 



(2.4) 



1=0 



This plane is perpendicular to the vector s. Different 
choices of this plane correspond to different false alarm 
rates. 

In the case where the noise is not Gaussian, the prob- 
lem becomes more challenging. In the Gaussian test, the 
decision statistic is independent of the signal amplitude 
e. However when the noise in not Gaussian, the choice of 
decision statistic depends upon e. Consider the graph in 
Fig. |l| showing the power function F{6\e) as a function of 
e for several different tests. All the tests have the same 
false alarm rate, but the optimal test depends upon the 
value of e. 
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FIG. 1: The power function F{&\e) is shown for three differ- 
ent tests. AU have the same false alarm probability F(5|0). 
Test 53 has the best performance for large e. Test ^2 is not the 
best test for any value of e. Test 5i is the best test for small 
e. The locally optimal test 5i is the one for which dF{S\e)/de 
is largest at e = 0. If the first derivative of the power function 
with respect to e vanishes for all tests, then the locally opti- 
mal test is the one with the largest second derivative (and so 
on, if additional derivatives vanish). 



For the case of weak signals in non-Gaussian noise, 
there is a useful test called the "locally optimal" test. 
For a given noise probability distribution, the locally op- 
timal test is easy to describe, and leads to a simple deci- 
sion statistic which can be calculated from the observed 
data lH^. To define this test, it is useful to again con- 
sider the set of all tests with a given false alarm rate, as 
shown for example in Fig. |^. The locally optimal test is 
the one that maximizes dF(6\e)/de at e = for a fixed 
false alarm probability. As above, one can show that the 
locally optimal test sets S — 1 inside the region where 

A(i) = [dlnp{x\e)/d€]e=o > constant (2.5) 

for some constant, and i5 = elsewhere (see Fig. |l|). The 
value of the constant determines (or is determined by) 
the false alarm probability. More generally, if the first 
derivative vanishes, the locally optimal test is determined 
by the first non-zero 



A 



(n) 



1 dXx|e) 



p(x|0) de" 



(2.6) 



e=0 



To understand the implications of this, it is helpful to 
consider several examples. 

The examples here are for the case where the (addi- 
tive) detector noise is independent for each sample value 
(so the noise spectrum is white) but has an arbitrary 
probability distribution. For convenience, we write 



p(xk) = n 



(2.7) 



where the function / is a quadratic function of its ar- 
gument for the case where the probability distribution 
of the noise is Gaussian. [Note: any probability distribu- 
tion for stationary additive noise where the sample values 
are independent can be written in this way. If the noise 
is not stationary but is still additive and independent, 
then each function / appearing in (2.7) may be different 
f{xi — esj ) — > fi{xi — esi).] The first derivative of the 
PDF (pj|) with respect to e is 



dlnp(x|e) 



E 

1=0 



Sifixi - esi) 



(2.8) 



where /' denotes the derivative of / with respect to its 
argument. Setting e = in this exp ression one can easily 
find the locally optimal test ( |2.5| ). This is defined by 
setting (5 = 1 in the region 



A 



(1) 



N-l 

E 

1=0 



Sif'{xi) > constant 



(2.9) 



and setting 5 — elsewhere. [Note: if e can take either 
sign ±e then an absolute value sign should enclose the 
LHS of the inequality in Eq. (2.9).] As before, the value 
of the constant determines the false alarm probability. 
Here are several examples: 

• Gaussian Noise: f{x) = x^ jl -|- ln(27r)/2, so 
/'(x) = X. For this case the locally optimal test 



(2.9) and the optimal test ( p.4| ) both give the same 
statistic: ^Y^=q ^i^i- This is the standard optimal 
linear filter. 



• Exponential Noise: f{x) = a\x\ — ln(a/2), so 
f'{x) — asgn{x). Here the locally optimal statistic 
is given by (^) as 



E 

i=0 



SiSgnixi) 



(2.10) 



where the sgn(a;) function is +1 for a; > and — 1 
for a; < 0. 

• Sum of distinct Gaussian processes: this is a 
white-noise version of the model given in [O] : 



(l-P)(2^)-i/V-ie-^ 
-fP(2^)-i/2a-ie-^'/^*', 



(2.11) 



where < a < a and P e (0,1). Usually one 
also has P ^ 1. This noise model is discussed 
in more detail later in this paper. It often arises 
when the most common source of noise is Gaussian, 
but there is also a "tail" of "outlier" events which 
dominates the wings of the distributi on. Here the 
locally optimal statistic is defined by (2.9) where 



(1 - P) + P(g/g)3e^'('^''-'^'')/^ 
il-P) + P(a/a)e^'('^-'-*-')/2 
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FIG. 2: The functions f'{x) and f'{x)/x are sho wn fo r the 
sum of distinct Gaussian processes, defined by Eq. ( 2.11 ) with 
parameters cr = 1, (t = 4, and P = 1%. For small \x\ one 
has f'{x) « X. Outside the central Gaussian region (which 
dominates the probability density), i.e., for large \x\, f'{x) 
falls off. This effectively "clips" the correlation statistic for 
outlier data samples. 



This function is shown (for the case cr = 1, ct = 4, 
P = 1%) in Fig. ||. Roughly speaking, for \x\ small 
compared to a one has f'{x) ~ xja^ . For large |a;| 
one has /'(x) ~ xju"^ . 

• Gaussian noise plus uniform background: 

Here, we have a (small) uniform background su- 
perposed on Gaussian noise of zero mean and unit 
variance. This is defined for (small) P > by 



-Six 



(1 -P)(27r)-i/2e-:^V2 \x\ 
0, \x\ 



< L 
> L 
(2.12) 

Here we assume that L 3> 1 is the scale size of the 
uniform background (the probability distribution is 
correctly normalized only in the limit L — > oo). In 



this case one finds that f'{x) 
f{x) = for 1 < |a;| < L. 



for 



< 1 and 



While the results for the different probability distribu- 
tions are technically different, they all carry same mes- 
sage, which is the central result of this paper: // the 
distribution of sample values has a central Gaussian re- 
gion, then sample values falling in this region should be 
correlated exactly as they would be in the Gaussian case. 
If a sample value falls outside this region, its value should 
be truncated (or clipped) to the largest allowed value in 
the central region, or even dropped from any correlation 
statistic, depending upon the shape of the probability dis- 
tribution. 

Let us repeat this central point one more time. The 
results show that when the noise is not Gaussian, the 



normal optimal filter used to construct a decision statis- 
tic is replaced by a somewhat different sum. The values 
of the expected signal Si are multiplied, not by the ob- 
served data Xi but by some non-linear function of that 
data, then summed. In the event that the probability 
distribution of the noise has a non-Gaussian tail, the ef- 
fect of this non-linear function is to "clip" or truncate 
sample values which fall outside the central bulge of the 
probability distribution function. 



B. Single detector, colored noise 

If the detector's noise spectrum is colored rather than 
white, then the previous analysis does not apply: the 
assumption that the different sample values are uncor- 
related no longer holds. However the analysis can be 
generalized to the colored case if we make assumptions 
that are motivated by the properties of stationary detec- 
tor noise. 

In explaining this, it helps to begin by describing the 
stationary Gaussian case. For a colored Gaussian pro- 
cess, the Probability Density Function (PDF) of the de- 
tector samples may be expressed as 

p(x) (27r)-^/2(jg^ R)-i/2 exp(-ix^ • Q • x) (2.13) 

where the N x N correlation matrix R = (x (g) x^) is a 
positive-definite real symmetric matrix with N{N -\- 1)/2 
real degrees of freedom and Q = R~^. We have assumed 
that the process has zero mean. The volume element 
associated with this PDF is dx = Hjllo^ ^^j ■ the 
time-domain, x is a vector of real numbers so x''^ = x""". 

In the case where the random process is stationary, 
the matrix R is a Toeplitz matrix, which depends only 
upon |i — j|. Such a process is defined by the first row or 
first column of the matrix and has only N real degrees of 
freedom. Thus stationary Gaussian processes are a tiny 
subset of all Gaussian processes. 

Now consider the PDF of new random variables that 
are linear combinations of the old ones: x = U • x. Take 
U to be an arbitrary unitary matrix. Clearly the PDF 
of these new variables x is still Gaussian. The matrix 
U can be chosen to diagonalize the correlation matrix: 
this is called a Karhunen-Loeve transformation. In the 
limit where the time interval occupied by the N samples 
is much larger than the correlation time of the noise, the 
linear combinations of random variables that diagonal- 
ize the correlation matrix asymptotically approach the 
Discrete Fourier Transform (DFT). This is given by 



(2.14) 



Thus, if N is sufficiently large, to a good approximation 
the PDF of the new variables in the Gaussian case may 
be written as 



p(x) 



l{N-l)/2] 

n 



P^7'exp(-2|ifc|VPfc) (2.15) 



fc=i 
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where Pk is the (real, positive) mean spectral amplitude 
in the kth frequency bin: 



(2.16) 



for 1 < k,k' < [{N - l)/2]; thus R = U • R • U^^ ~ 
5diag[Pfe]. In other words, it is a good approximation to 
express the PDF of a stationary colored Gaussian process 
as a diagonal process in frequency space. 

The limits of the product in Eq. (2.15) appear strange 
because x can not take on arbitrary values since x is real. 
The consequences include: 

u Xk = Hence the amplitudes of Xk for k = 

[N/2] + 1, . . . , A'' — 1 are completely determined by 
ifc for fc = l,...,[(iV-l)/2]. 

• xq and, for even N, Xi^/2 ^'^^ real. However, we as- 
sume that the data set has had the mean value (DC 



term) removed: ^ 



N-l 



0. Since gravitational 



wave detectors are AC-coupled and have no use- 
ful low-frequency response, this is a valid assump- 
tion. It implies that xq is identically zero. Second, 
when N is even, we assume that there is no energy 
in the Nyquist frequency bin: Xi^/2 ^Iso vanishes 
identically. This is a very reasonable assumption, 
since an experiment will include an anti-aliasing fil- 
ter whose response (as a function of frequency) falls 
off rapidly as the Nyquist frequency is approached. 

• The volume element associated with this PDF is 
therefore Wl^^^^'"^^ d(5Rifc)d(3ifc). 

The likelihood ratio in the case of colored, stationary, 
Gaussian noise is 



In A = lnp(x — es) — Inp(x) 

= est • Q • X - ie^st • Q • s 

or, in the frequency domain, 

[(N-l)/2] 

In A = (constant) -|- e43? ^ s%Xk/Pk- 



(2.17) 



(2.18) 



Thus, the matched filter statistic, with a weighting equal 
to the inverse of the noise spectrum, is the optimal de- 
tection statistic. 

This motivates a more general model for the statistical 
distribution of colored non-Gaussian detector noise, as- 
suming that it is still stationary. In this case, to good ap- 
proximation, the two-point correlation matrix (xkX^,) is 
diagonal. There may be higher-order correlations present 
between the Fourier amplitudes at different frequencies, 
but we will assume that this additional correlation is neg- 
ligible, and that to a reasonable approximation the prob- 
ability distribution of the noise in the non-Gaussian case 



is described by a PDF in which the different frequency 
components are independent: 



[(JV-l)/2] 

p(i)= H 27T-'p-^exp[ 

A;=l 



'2gk{\x,\yPk)], (2.19) 



with volume element Ofc^^i d{^Xk)d{'^Xk)- The 
functions gkiu) depend upon the frequency bin index k, 
so that the statistical distribution can depend upon the 
frequency. For the colored Gaussian case the functions 
are gkiu) — u. In order that the PDF be properly nor- 
malized, and that {xkX^,) = ^^kk'Pk, the functions gk{u) 
must obey 



ue-9'=(")dw = 1. 



(2.20) 



Respectively, these constrain the additive constant in the 
definition of gk, and the multiplicative scale of the argu- 
ment of gk- This is not the most general possible form of 
the probability distribution of a stationary random pro- 
cess, but in many situations it should be a reasonable 
approximation, particularly if the quantities of interest 
are dominated by the second moments. 

The locally optimal statistic may now be easily de- 
rived. Letting Sk denote the DFT of the expected wave- 
form, and as before zeroing its DC and Nyquist com- 
ponents, the conditional probability distribution of the 
detector output is given by 



[(JV-l)/2] 

pm = n 



2n-'P^' 



e-xp[-2gki\xk - eskf/Pk)] 



fc=i 



The locally optimal test can then be obtained from the 
first derivative: 



l{N-l)/2] 



A 



(1) 



k=l 



m~s*kik/Pk)9'k{\ik\VPk). (2.21) 



In the colored Gaussian case .g^(u) = 1 this is the or- 
dinary optimal linear matched filter. The contributions 
of the different frequency bins are weighted by the in- 
verse noise power spectrum in that bin. In the non- 
Gaussian case, just as for the case of uncolored white 
noise, the correlation in frequency space is clipped or 
truncated for (frequency-bin) samples that lie outside 
the central Gaussian part of the probability distribution, 
where |5'(u)| <C 1. An example of this may be seen in 
Fig. ||: for the illustrated case g'{x'^/2) = f'{x)/x. 

Let us consider another form of non-Gaussian noise 
that describes a process in which there is an ambient 
Gaussian noise background interrupted occasionally by a 
large noise burst, which we will model as a second com- 
ponent of Gaussian noise with a much larger variance. 



The probability distribution we adopt is |11 



p(x) = (l-P)(27r)"*(detR)-^exp(-ixt -Q-x) 
-FP(27r)"^(det R)-5 exp(-ixt • Q • x)(2.22) 



6 



where R is the autocorrelation matrix for the normal 
ambient detector noise and R is the composite autocor- 
relation matrix for the detector noise when a noise burst 
is present. The noise bursts occur with probability P 
in this model Also, Q = R^^ and Q R"^ We 
assume that Q is much smaller than Q, meaning that 
x''' • Q • X 3> xt • Q • X for all vectors x. The locally 
optimal statistic is 



A(i) - 
where 



dlnp(x — es) 



de 



jR(st ■ Q ■ x) 
1 + a 



detR 



1 - P V det R 



exp[-xT • (Q — Q) • x] 



(2.23) 



(2.24) 



is a detector of possible bursts. When a burst is absent, a 
is typically small and the locally optimal statistic reduces 
to the matched filter. However, when a burst is present, 
a is typically large and the matched filter is suppressed. 
Thus the locally optimal statistic is nearly equivalent to 
the matched filter statistic with a veto if a segment of 
data has a large amount of excess power as measured by 



or, in the frequency domain. 



Q 



[(JV-l)/2] 
k=l 



(2.25) 



(2.26) 



The lengths (in time) of the data chunks used to esti- 
mate the autocorrelation matrices should be choosen to 
be significantly longer than the characteristic time of the 
signals being searched for, but still short enough that the 
detector behavior is quasi-stationary. For inspiral signals, 
typical signals are in the detector band for tens of sec- 
onds, so the matrix estimation time should be at least of 
order tens of minutes. For stochastic background detec- 
tion, the correlation time between the two instruments is 
tens of milliseconds, so that the matrix estimation time 
should be at least a few seconds. 

Based on the two forms of non-Gaussian noise con- 
sidered in this section, it seems reasonable to adopt the 
following detection rules: (i) veto immediately any seg- 
ment of data that has an excess of power as measured by 
the excess power statistic; (ii) for segments of data with- 
out an excess of power, construct the matched filter in 
the frequency domain, but exclude those frequency bins 
in which the detector power is too large. The resulting 
(truncated) matched filter is a good approximation of the 
locally optimal statistic for a wide variety of possible non- 
Gaussian noise distribution. In this sense, it is a robust, 
nearly optimal detection statistic. 



III. STOCHASTIC SIGNALS 

Observational limits from nucleosynthesis demonstrate 
that the stochastic background of gravitational radiation 



has such small amplitude that it would not be detectable 
with a single instrument |p^ . In a single instrument, 
there would be no practical way to discriminate between 
intrinsic detector noise and the small additional noise- like 
output arising from a stochastic background. However, 
one can correlate the outputs of two different instruments 
and search for a common signal in this way. If the instru- 
mental noise is not Gaussian, then the previous single- 
detector analysis can be easily generalized. 



A. Two coincident co-aligned detectors, white noise 

We begin by considering the simple case in which the 
two detectors are coincident and co-aligned, so that they 
have identical output contributions from the stochastic 
background but independent intrinsic noise. We also as- 
sume that the intrinsic noise samples in each detector are 
independent, and hence white. 

If the signal were deterministic (known) then the joint 
probability distribution for the samples in the two detec- 
tors could be written as 



p(xi,X2|e) = Y[ 



-fi{xi 



(3.1) 



4=0 



This system can be analyzed in exactly the same way as 
in Sec. II A. However the stochastic background does not 
produce a known (deterministic) signal, so that the prob- 
ability distribution needs to be averaged over its expected 
distribution Pshiso, ■ ■ ■ , sw-i) (which, by reason of the 
central limit theorem, is almost certainly a multivariate 
Gaussian). This leads to a joint probability distribution 
which is given by 

p(xi,X2|e) = J dso - ■ ■ J dsN-iPsh{so, ■ ■ ■ ,SN-i) 

N-l 

X Yl e-f^'^''^-'-"''^-^^'^''^-'-"''^ (3.2) 

i=0 

= / dsp,b(s) n e-^^(^i--"')-/=(^' 



es,) 



j=0 



Here e may be thought of as the coupling of the detector. 
The case of small e corresponds to a detector that is only 
weakly coupled to the signal. For this non-deterministic 
signal, it is still straightforward to construct a locally 
optimal test, and a corresponding decision statistic or 
threshold criterion. 

The locally optimal statistic is obtained from the 
derivative of the probability distribution with respect to 
e. This is given by 



dp(xi,X2|£) 

de 



= y dspsb(s) 
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■i=0 



fi{xi 



-tSi)- f2{x2,i-iSi) 



(3.3) 



Setting e = and dividing by p(xi,X2|0) yields the lo- 
cally optimal statistic: 



(3.4) 



Unfortunately this vanishes if the random process de- 
scribed by Psb(s) has vanishing mean, since in this case 
/ SjPsb(s)ds — 0. This is indeed the case for the 
gravitational-wave stochastic background. 

When the first derivative vanishes, the locally optimal 
statistic is defined by having the largest second deriva- 
tive at e = 0. Se e Fig. |^ for example. Taking another 
derivative of (3.3) and setting e = yields 



A(2) / dspsb(s) <^ (E1o^Sj[/i(2^1j) + /2(2;2,j)] 



-Ef=o'^'[/r(^l..) + /2K,)] 

(3.5) 

The terms that appear in this statistic have different 
character, and before moving on, some discussion is re- 
quired. 

The locally optimal statistic depends upon the sta- 
tistical character of the stochastic background radiation 
through the second-order moments. We will assume that 
the stochastic background is a stationary process, so that 
the second order correlation {siSj) is a function of the lag 
\i- j\ only: 



C(|z-j|) = (,s,.s,) 



dspsh{s)s.,Sj 



(3.6) 



In a stochastic background search, the "signal model" 
only requires an assumption about the form of the spec- 
trum. This is (roughly) the Fourier transform of C(A). 
Without loss of generality we normalize C(A) so that 
C(0) = 1 (this simply scales the value of e). Expressing 
the locally optimal statistic in terms of the correlation 
function C then gives 



A(2) = -E[/"(^l.^) + /2(^2,,)] 



i=0 
Af-1 



- E C{\]-k\)[f[{x,^,)f[{x,^k) (3.7) 
+f2Mr2{x2,k) + 2/{(a;i,,)/^(x2,fe)]. 



Each of the five terms that appears in (3/7) has a spe- 
cific interpretation. The first four terms that appear in 
the locally optimal estimator A(2) are generalized "single- 
detector" statistics which do not cross-correlate the two 
detectors. They are generalized measures of the "energy" 
received by each individual detector, and provide useful 



information only if the stochastic background contributes 
substantially more to the measured signal than the de- 
tector output does, or if the detector's intrinsic noise con- 
tributions can somehow be separated from the noise con- 
tribution arising from the stochastic background. (This 
will not be the case for the first few generation s of grav- 
itational wave detectors). The last term in ( |3.7D is a 
Generalized Cross-Correlation (GCC) statistic, that pro- 
vides useful information even if the detector noise domi- 
nates the signal: the expected case for gravitational-wave 
stochastic background. To quote from Kassam (following 
Eq. (7-24) in Ref. Q) 

It is important to note that the increase in 
power level occurs whenever random signals 
are present at the individual receivers of the 
array regardless of whether the signals across 
the array are one common signal or are com- 
pletely uncorrelated. The GCC part of the 
Locally Optimal (LO) statistic responds only 
to a common signal or at least to signals 
which are spatially correlated across the ar- 
ray elements. This is a major reason why it 
is useful to employ only the GCC part of the 
LO statistic in applications involving detec- 
tion as well as location of signal sources. 

For this reason (and others jl^) we drop the single- 
detector terms from the statistic, and define the GCC 
statistic as: 



A 



GCC 



N-1 



k\)f'i{xi,o)f2{x2,k). (3.8) 



This generalized cross-correlation statistic reduces to the 
ordinary cross-correlation statistic in the case where the 



detector noise is Gaussian: Ji{x) ~ f2{x) 
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log(27r)/2. It can be easily generalized to the case of 
three or more detectors pO{ . 

In practical work C will vanish for lags greater than the 
light travel time between the two detectors (i.e., 10 ms for 
the LIGO detectors) . This means that even if N is chosen 
to be very large, Aqcc only correlates samples from the 
two detectors taken within this time window. (Note: if 
the detector noise is colored, then the time window may 
be larger, as will be seen shortly.) 



B. 



Two non-coincident non-co-aligned detectors, 
colored noise 



In this section, we generalize the work of Sec. Ill A to 



the case where the two detectors are not coincident or 
co-aligned, and their noise power spectrum is not white. 
We assume that the intrinsic detector noise of the two 
detectors is independent. If the two detectors are widely 
separated and subject to different environmental influ- 
ences, this assumption should hold. 
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Let us start by assuming that the two detectors eaeh 
have internal (instrumental) colored Gaussian noise with 
known autocorrelation matrices Ri„^i and Rin,2, and that 
the instrumental noise of the two detectors is indepen- 
dent. The stochastic background produces an additional 
source of colored Gaussian noise that is correlated be- 
tween the two detectors. The stochastic background 
noise is measured by the autocorrelation matrices Sn = 
(si0s|), S22 = (82(8)82), and the cross-correlation matri- 
ces S12 = (si ® S2), S21 = (82 (8) s\). Since the stochastic 
background is isotropic, Sn = S22 = R-sb (the stochastic 
background contribution to the detector's autocorrela- 
tion matrices) and S12 = S21 = S (the cross-correlated 
noise between the detectors due to the stochastic back- 
groimd). The total autocorrelation noise of the two de- 
tectors are Ri = Rin,i + e^Rsb and R2 = Rin,2 + e^R-sb- 
In the presence of the stochastic background, the likeli- 
hood ratio is 



{2n) 



-N 



p(xi,X2|e) = 
where 

In the weak signal approximation, 



(detS)-iexp(-i^t.s-i.^) (3.9) 



Xl 
X2 



and S 



Ri 



e^S R2 



Qin.l 
Qi„,2 



+e- 



+ 



Qin,l • R-sb • Qin,l Qin,l ' S • Qin,2 
Qin,2 • S • Qin,l Qin,2 ' Rsb " Qiii,2 

Qin,l • Rsb • Qin,l ' Rsb ' Qin,l, 
0, Qin,2 • Rsb • Qin,2 ' Rsb ' Qin,2 

Qin,l • S • Qin,2 ' S • Qin,l, 
0) Qiii,2 • S • Qin.l • S • Qm,2 



+0{e% 



(3.10) 



In det S = In det Rin,i + In det Rin,2 

+e^[tr(Qin,i • Rsb) + tr(Qin,2 • Rsb)] 

~^''[2^-'^(Qin.l ' '^^^ ' Qi":! ' Rsb) 

+ 2^'^(Qin,2 • Rsb • Qin,2 ' Rsb) 
+tr(Qin,2 • S • Qin,l • S)] 

+0{e% (3.11) 



and 



In A = lnp(xi,X2|e) - lnp(xi,X2|0) 

= 2*^^(^111,1 • Rsb) — 2^^{Qin,2 ■ Rsb) 

+5?(x| • Qin,2 • S • Qi„,l • Xl) 

+ ^xt •Qi„,i-R,b-Qi„,i-xi (3.12) 

+ \A ■ Qin,2 • Rsb • Qm,2 ' X2} + 0{e*) 



where Qins 



'R-^\ and Qin,2 



R,^ 



The last two 



terms represent the autocorrelation "energy" detectors. 

The following question now becomes important: how 
does one obtain the quantities Rin,i and Rin,2- There 
are two possible methods: (i) by a theoretical under- 
standing of the detector, or (ii) by shielding the instru- 
ment from the stochastic background and measuring the 
noise autocorrelation. For gravitational wave searches, 
method (ii) is not available as there is no way to shield 
the detector from a stochastic background of gravita- 
tional waves. Method (i) holds more promise, but if the 
stochastic background is expected to be weak, it is un- 
likely that our understanding of the detector will be suf- 
ficient to distinguish between the noise autocorrelations 
Rin and Rin -I- eRsb- We expect that the noise matri- 
ces that should be used are the measured noise matrices 
Ri = (xi 5$ x{) and R2 ~ (x2 (S) xj), which contain both 
the internal, instrumental noise as well as the stochastic 
background "noise." Since it is these quantities rather 
than Rin.i and Rin, 2 that are known, the previous anal- 
ysis must be modified. We now have 



1 _ 


Qi 





Q2 






Q2 s 


Qi 


Qi s Q2 





Qi 

.0' 


SQ2 
Q2S 


S Qi, 
Qi S 




Q2 





(3.13) 



and 



In det S = In det Ri + In det R2 

-eV(Q2 • S • Qi • S) + 0(e'^), (3.14) 



In A = e2sR(x| • Q2 • S • Qi • xi) O(e^) (3.15) 



where Qi = ti-i^ and Q2 = R2^'''. The locally optimal 
detection statistic (which is appropriate for weak signals) 
is the cross-correlation statistic. 

To generalize to non-Gaussian noise, it is helpful to 
use moment generating functions. Suppose the vector ni 
represents the internal (instrumental) noise in the first 
detector. The moment generating function for ni is 



$in,i(wi) = (e'- •"^) 



(3.16) 



and the probability distribution for ni is the Fourier 
transform of the moment generating function: 



Pin,i(ni) = / dwie 



in,l- 



(3.17) 



The moment generating function $in,2(w2) for the inter- 
nal noise in detector 2 is defined similarly. We assume 
that the stochastic background is a multivariate Gaussian 
with a moment generating function 



1 



*sb(wi,W2) = exp(--e u; • S^b • w) 



(3.18) 
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with 



Wl 
W2 



and Ssb = 



Rsb S 
S Rsb 



Then the moment generating function for the detectors' 
output is 



$(wi,W2) = 



_ /„«W, -Xi iW, ■X2\ 



(3.19) 



= $in4('*^l)*m,2(w2)$sb(wi,W2) 

and the joint probabihty distribution is 

p(xi,X2|e) = J dwidw2e-'(^?-^i+^?-^^)$(wi,W2) 

= J ducxp{-i$,^ ■ W)$in,l(wi)$in,2(w2) 



p(xi,X2|0) 

+e2{(VTpi„,2)(x2)-S-(Vpin,l)(xi) 



(V^Pin4)(xi) • Rsb • (Vpin,l)(xi) 



+ 2(^ Pin.2)(x2) ■ Rsb • (Vpin,2)(x2)} 



(3.20) 



Thus, if we ignore the autocorrelation terms, the locaUy 
optimal statistic is 



A 



(2) 



(VTlnpi„,2)(x2) • S • (Vlnpi„4)(xi). (3.21) 



This equation for the locally optimal statistic is good for 
the time domain, in which the detectors' output vectors 
are real and so the derivative is meaningful. 

To extend the result to complex vectors, and thus to 
a frequency-domain representation, we use the following 
formal replacement: replace every complex number x = 
a + ib and derivative V with the matrices 



a b 
—6 a 



and V 



d/da ~d/db 
d/db d/da 



Note that this means x* is represented by xj- and \x\'^ 
by x^ ■ X. Also, the meaning of V|a;p is V_{x^ ■ x) = ■ 
The locally optimal statistic is 

A(2) = ^(VTlnpi„.2)(x2)- S •(Vlnpi„,i)(xi) 

+ i(Vlnpi„.2)(x2)- S •(VTlnpi„,i)(xi(j3.22) 

For example, for the noise model in which Inpin(x) oc 
Efcfr'^^" 9ki\ik\V2Pk) and S = diag[7fca2], the locally 
optimal statistic is 



A(2) = 3? 



[(Ar-l)/2] 2~* ~ 



fc=i 



Pl,kP2,k 

Xff'i^fc(|ii,fc|VA,fc)32.fc(|52,fc|V^2,J.3.23) 



Before we examine specific non-Gaussian noise models, 
we will describe the form of the matrices Rsb and S. A 
stochastic background, if present, contributes to the sig- 
nal amplitude at each detector. To simplify the analysis, 
in Sec. Ill A, we assumed that the detectors were coinci- 



dent and co-aligned, so that the amplitude contribution 
in each individual detector are identical. Here, we drop 
that assumption. 

Because the detectors are not co-aligned, the axes of 
the two interferometer arms point in different directions, 
and are sensitive to different linear combinations of the 
two possible gravitational wave polarizations. This re- 
duces the correlation between the amplitudes in the two 
detectors, since we will assume that the stochastic back- 
ground is unpolarized. An additional loss of correlation 
occurs because the two detectors are separated. This loss 
of correlation becomes increasingly greater for shorter 
wavelengths. Roughly speaking, there is no significant 
loss of correlation for wavelengths much longer than the 
inter-detector distance, and there is a complete loss of 
correlations for wavelengths much shorter than this|]l^. 

The loss of amplitude correlation due to the separation 
and non-alignment of the two detectors may be described 
(for an unpolarized and isotropic stochastic background) 
in terms of the overlap reduction function j{f) defined 
by Flanagan ||T^. This quantity is the average value of 
the product of the detector outputs, for a stochastic back- 
ground of a given frequency /, averaged over the possible 
directions of arrival and phases. It is given by: 



^(/):^JL / dCle^'''f^-'^^/%F+F+ + F^F^). (3.24) 

Sir Jg2 

Here 1] is a unit-length vector on the two-sphere. Ax is 
the separation between the two detector sites, and F^"^ 
is the response of detector i to the + or x polarization. 
For the ith detector (i = 1, 2) one has 



1 



(3.25) 



where e^(j'^(fl) are the gravitational wave polarization 
tensors for a wave propagating in direction Cl. The nor- 
malization of 7(/) is chosen so that for coincident and 
co-aligned detectors, 7(/) = 1. For co-aligned but not co- 
incident detectors, 7(/ = 0) = 1. For coincident but un- 
aligned detectors, 7(/) is a frequency-independent con- 
stant that depends only upon the relative orientation of 
the two detectors, and vanishes if the two detectors are 
sensitive to orthogonal polarizations. 

General expressions for l(.f) for arbitrary detectors 
may be found in Refs. |lj. For the pair of LIGO 
detectors 7(/) is shown in Fig. ^ and is given by 



7(/) «-0. 1248 jo(a:^)- 2.900 



3.008 



(3.26) 



where x = 2nfd/c is a frequency variable, d = 3010 km 
is the detector separation, c = 2.998 x 10^ km/s is the 
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200 ?5 
Frequency f (Hz) 




FIG. 3: The overlap reduction function 7(/) is shown for 
the two LIGO detectors as a function of frequency /. The 
left/right graphs have hnear/logj^Q frequency axes. Because 
the detectors are almost anti-aligned, the function is close to 
— 1 at low frequencies. The first root is at 64 Hz. 



speed of light, and j„ is a spherical Bessel function. It 
is helpful to introduce notation for the overlap reduction 
function's values in the frequency bins of interest. Let 
fk = k/{NAt) — k/T denote the frequency of the fcth 
bin, with fc = 0, . . . , [-/V/2]. Here At is the sample interval 
and T = NAt is the total observation time. Then 



7fe = 7(/fc) = lik/T). 



(3.27) 



are the values of the overlap reduction function in the 
fcth bin. 

The stochastic background is characterized by its di- 
mensionless energy density 



^gw(/) — 



1 



Pcritical In / 



(3.28) 



where dpg^ is the energy density of the gravitational ra- 
diation contained in the frequency range / to f + df, and 
^critical IS the Critical energy density required (today) to 
close the universe: 



^critical 



SttG 



1.6 X 10 hiQQ erg cm 



(3.29) 



Hq is the Hubble expansion rate (today): 

Ho = /iioo X lOOkms^i Mpc^i = 3.2 x IQ-i^/iioo s~\ 

(3.30) 

and ft-ioo is a dimensionless factor that we have included 
to account for the different values of Hq that are quoted 
in the literature. pO| 

The PDF of the stochastic background strain can usu- 
ally be expressed in closed form. The central limit the- 
orem shows that if the stochastic background has been 
produced (as it is in many scenarios) by an incoherent 
sum of many small processes, then its statistics will be 
stationaryljl^ and Gaussian. This means that it is char- 
acterized by the single-site second moments 



= crlSkk' 



with 



3-^0^gw(/fc) 

20^2At|/,|3- 



(3.31) 



(3.32) 



As before, we have assumed that N is chosen so that NAt 
is much larger than the correlation time of the stochastic 
background (filtered by th e ins trument response func- 
tion), so that the RHS of ( 3.33 ) is proportional to Skk'- 
The expectation value of the product of the strain at the 
two different sites is reduced by the overlap reduction 
function: 

(si,fcS2,fc') = ih.kstk') = 7fe(si,fcSi,fc') = IkCrlSkk'- 

(3.33) 

This follows from Eqn. (3.56) of reference jl^. In prac- 
tice, since the shape of the stochastic background spec- 
trum is not know, the dependence of the ak on k should 
be assumed to fit some simple parameterized model, such 
as a power law cr^ oc fc" for a reasonable range of a. 

We can now express the locally optimal detection 
statistic for a stochastic background in colored Gaussian 
noise. It is: 

[(JV-l)/2] 

\nA = e^?St J2 lkCTlxlj^X2,k/{Pi,kP2.k) (3.34) 
fe=i 

where Pi^k and P2.k are the measured noise spectra in 
the two detectors. 

Let us now turn to our first non-Gaussian noise model. 
Our starting point is a PDF for the noise in the two detec- 
tors in the absence of any stochastic background signal. 
We make the s ame assumptions about the detector noise 
as in Sec. HE . The PDF is given in freq uency space by 
a product of two terms identical to ( ^.19 ), 



[(W-l)/2] 

p(ii,X2) = n 27r-ipJe-29i,^(l*i,^lVPi..) 



fc=i 

[(JV-l)/2] 

X n 



-2ff2,fc'(|52.fc'lV-P2,*.') 



k' = l 
[{N-l)/2] 

n 47r-2pJP, 



2 

2,k 



k=l 



Xe-29l,fc(|£l.fclV-Pl,fc)-2g2,fc(|52,fclV-P2,fc)^ 

The statistical distribution of the stochastic back- 
ground is 

l{N-l)/2] 

ftb(§i,§2)= n (W)"'(i-7fc)"' 



fe=l 



X exp 



|si,fcP + 1^2, fep - 27fe3fJ(l* ^.S2.k) 



3.35) 



We can now find the locally optimal statistic. Since the 
detector is linear, as before, one has a joint probability 
distribution for the observed Fourier amplitudes: 

p(xi,X2|e) = J dsicis2Psb(si,i2)p(xi - eii,X2 - es2). 
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This corresponds to a stochastic background with a char- 
acteristic energy-density function eilgw(/).|0 The lo- 
cally optimal statistic is 



[(Ar-l)/2] 



A(i) = 43? 



k=l 



{slk)ihkg[A\S:i,k\yPi,k) 



Pi 



l,fc 



2,fc)^2,fcg2,fc(l^2,fc|V-P2^ 



P2,k 

where the quantities {si j.) and (sj are mean values of 
the stochastic background's Fourier amplitudes at each 
of the two detector sites. These both vanish: 

(S{l,2},fc) = J dsids2Psb(si,S2)s:[i 2}.fc = (3.37) 

since the mean values of the Fourier amplitudes are zero. 
Hence, as in Sec. [II A one must look for the locally opti- 



mal statistic at the next order in e. Taking an additional 
derivative, one can easily compute A(2)- As in Sec. [II A 



this consists of two types of terms. For the same reasons 
as before, we discard from this decision statistic all the 
single detector terms. This leaves us with the following 
generalized cross-correlation statistic: 



Agcc = 16 y dsids2Psb(si,S2) 



k,k' = l 



Pl,k 

5^(s2,fc'^2,fc' )52.fe' (l^2,fc' I V^2,fc') 



P2,k' 



^3.38) 



Since the expectation value of the product of the stochas- 
tic background at the two sites is given by 

{si,kS2,k') = y rfSidS2Psb(si,S2)si,fcS2_fc = Skk'^kC^k 

(3.39) 

one obtains the generalized cross-correlation statistic 

[(Af-l)/2] 2~* ~ 



k=l 



PukP2,k 



X5i,fc(|5i,fc|VPi,fcKfe(|i2.fc|V^'2,*C8.40) 

If the functions g' are replaced by unity, this reduces 
to the standard result for the optimal filter for the case 
where the detector noise is assumed to be stationary and 
Gaussian. For typical non-Gaussian noise models, the 
effect of the g' functions is to exclude those frequency 
bins in which \xk\'^/Pk is large in either detector. 

Our second non-Gaussian noise model is similar to the 
noise burst model used in Sec. p B| , generalized to the 
two detector case. The composite PDF for this model is 



j5(xi,X2|e) = (27r) 



-N 



x{(l - Pi)(l - P2)(det S)-iexp(^-^t . ^-i . ^) 



+Pi(l - P2)(det Si)-i exp(-i^^ • • 

-fP2(l - Pi)(det S2)-i exp(-i^^ • S^^^ • 

-fPiP2(det Si2)-i eM-^i^ ■ ^12 ■ i)} (3.41) 

where Pi and P2 are the probabilities of bursts in detec- 
tors 1 and 2. The matrices Si, S2, and S12 represent 
the correlation matrices when a noise burst is present. 
As in Sec. [IB, a burst effectively changes the noise level 
for the detector experiencing the burst. Thus, if there is 
a burst in detector 1, simply replace Ri with Ri in S to 
obtain Si. Then we find 



Qi -e^Q^ .S Q2 

e2Q2 .s Qi Q2 



and 



(3.42) 



IndetSi ~ IndetRi -f-lndetR2 + tr(Qi • Ri) -I- 0(e'') 

(3.43) 

to first order in Qi and similarly for S2. We also have 



■'12 



Qi 
Q2 



(3.44) 



and 



IndetSi 



In det Ri + In det R2 

+tr(Qi • Ri) + tr(Q2 • R2) (3.45) 



to first order in Qi and Q2. 

We can now compute the locally optimal statistic: 



A 



(2) 



23fi(x^ • Q2 • S • Qi • x) 



1 + a^ 



012 + aia2 



where 



ai 



Pi detRi A ^ r^ ^ 
s— exp(-x • Qi • x) 

1-PidctRi ^^2 1 ' 



(3.46) 



(3.47) 



and a2 is given by a similar expression. Here we have ne- 
glected all Q terms. The terms ai and a2 detect bursts, 
and their role is to suppress A(2) when a burst is present 
in either detector. 



C. Estimators 

In analyzing experimental data, there are different pos- 
sible goals. One goal might be to set an upper limit (with 
a certain statistical confidence) on the stochastic back- 
ground energy density in a particular frequency band. 
Another goal might be to estimate this energy density in 
a particular frequency band. 

For this latter purpose, there are different possible es- 
timators that might be used. One standard estimator is 
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the Maximum Likelihood Estimators (MLE) . In this sec- 
tion, we show how this estimator is related to the cross- 
correlation statistic. 

Recall that the probability distribution for the joint 
detector output is 

lnp(xi,X2|e) — (terms that don't depend on e) 
+e^xj • Q2 • S • Q2 • xi 

+ ie4{tr(Q2.S.Qi.S) 

-xj; • Qi • S • Q2 • S • Qi • xi 

-x^ Q2 • S • Qi • S • Q2 • X2} 

+0{e^). (3.48) 

Suppose we wish to estimate the strength of the 
stochastic background. The Maximum Likelihood Es- 
timator is the value e^LE ^'^^ which this probability is 
maximized: [(ilnp(xi, X2|e)/(ie^](:2 =0. The result is 



where 



Cmle = ^7x^ Q2 • S • Q2 • xi (3.49) 



= -tr(Q2 • S • Qi • S) 

+xj • Qi • S • Q2 • S • Qi • xi 

+x^ Q2 • S • Qi • S • Q2 • X2 (3.50) 

is a measure of how how sensitive the detectors were to 
the stochastic background. Normally 77 will be on the 
order of unity so Emle approximately just the cross- 
correlation statistic. However, if the detector were ab- 
normally noisy, then rj would be less that unity and the 
estimate of the stochastic background strength would be 
smaller than the cross-correlation statistic would indi- 
cate: this is a correction that compensates for artificially 
large values of the cross-correlation statistic due to noise 
fluctuations. 

Another possible estimator is the Bayesian estimator. 
In the long measurement, weak signal case this again 
yields the same result as the MLE estimator. 



IV. IMPLEMENTATION/SIMULATIONS 

A. Implementation 

A nice feature of these techniques is that in practice, 
they should be easy to implement. Work by Scott and 
Whiting |l^ has shown that the PDFs of the Fourier 
amplitudes in different frequency bins can be easily ob- 
tained. Since the characteristic time-scale for stochastic 
background correlation is « 10 ms, these can be com- 
puted using data-segments with lengths seconds or tens 
of seconds. These PDFs can then be used to determine 
where to truncate or clip the correlation, frequency-bin 
by frequency-bin. Provided that the instrument's char- 
acteristics are stable over periods of minutes or hours, it 



should be simple to accumulate sufficient statistics to de- 
termine the PDFs and therefore the truncation or weight- 
ing functions with reasonable accuracy. 

In practice, it may also be desirable to "discard" a 
small part of the "attainable-in-principle" correlation in 
exchange for obtaining more robust statistics. For ex- 
ample, on can arbitrarily zero the 1% of frequency bins 
that are the largest number of standard deviations away 
from the mean value (for that bin). Since the dominant 
contribution in any bin always comes from the detector 
noise, this is only very weakly correlated with the actual 
stochastic background signal, and the net effect is to dis- 
card just a bit more than 1% of the "in principle" attain- 
able signal-to-noise ration. But in exchange, the detec- 
tion statistic becomes far less sensitive to non-Gaussian 
detector fluctuations. The precise effects of such treat- 
ment, and the appropriate truncation thresholds, can be 
easily determined with Monte Carlo simulations using 
simulated signals added into real detector noise. 

In searching for a known waveform (e.g., binary inspi- 
ral) the methods are again easily implementable. Here, 
since the signal timescale is less than a minute, the 
frequency-bin by frequency-bin statistics take a bit more 
time to accumulate, and the detector's statistical proper- 
ties have got to be stable over a slightly longer time-scale 
(an hour, perhaps). This appears likely. 

Since certain non-Gaussian noise features are more 
likely to appear as outlier points in the time-domain, 
and others in the frequency-domain, a combination of 
the time- and frequency-domain methods may be desir- 
able. Unfortunately, if the detector noise is not white, 
this may require the removal (vetoing) of entire small sec- 
tions of time-series data. This is easy in the stochastic 
background case, where only tens of milliseconds around 
a glitch need excision. It may be more problematic for 
signals like binary inspiral chirps that have longer dura- 
tion. 



B. Comparing different statistics 



In Sees. II and III, we derived locally optimal statis- 
tics to search for deterministic and stochastic gravita- 
tional wave signals in the presence of non-Gaussian noise. 
These statistics reduce to the standard matched-flltering 
and cross-correlation statistics when the detector noise is 
Gaussian. But they are more robust (i.e., less sensitive 
to outliers) when the detector noise has non-Gaussian 
components. For both cases, the standard and robust 
statistics take (as input) the output of one or more de- 
tectors, and return (as output) a single real number. But 
the statistics also depend on the gravitational wave sig- 
nal and detector noise models, which are not directly 
observable. Different choices for the signal and noise 
models correspond to different statistics, and these differ- 
ent statistics will in general perform differently given the 
same detector output. In order to compare and evaluate 
the statistics, we need a way to quantify their perfor- 
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mance. 

As mentioned in Sec. II, the quality of a test (i.e., a 
decision rule based on a particular statistic) is character- 
ized by its false alarm and false dismissal probabilities 
for a given source. These are, respectively, the proba- 
bility that the test leads us to conclude that a signal is 
present, when in fact it is absent (e = 0), and the prob- 
ability that the test leads us to conclude that a signal 
is absent, when in fact it is present (e > 0). These two 
probabilities (denoted a and /3c) completely specify the 
long-term performance of a statistic. But to rank dif- 
ferent tests, we need to reduce these multi-dimensional 
error measures to a single figure of merit. How we do this 
depends on the problem we are trying to solve (see, e.g. 

) , but in the context of gravitational wave detection, 
it is common to look for a test that minimizes the false 
dismissal probability, keeping the false alarm probabil- 
ity less than or equal to some maximum tolerable value. 
This criterion is known as the Neyman- Pear son criterion, 
and it was used in Sec. II to define the locally optimal 
statistics. 

Thus, to compare the performance of different statis- 
tics, we should plot false dismissal versus false alarm 
curves for different values of the signal amplitude e. The 
best test (or best statistic) is the one that has the small- 
est false dismissal probability (3^ [a] , for fixed false alarm 
probability a and fixed signal amplitude e. Note that 
since the false dismissal probability depends on both a 
and e, it is possible that the best test for one choice of 
(a, e) is not the best test for a different choice of (a,e). 
Note also that this method of comparing statistics is dif- 
ferent than simply comparing expected signal-to-noise ra- 
tios. What is important when determining error rates 
(and hence the performance of a particular test) is not 
the expected value of the statistic, but rather its proba- 
bility distribution. 

For sufficiently simple statistics with sufficiently sim- 
ple signal and noise models, it may be possible to ana- 
lytically calculate the corresponding false dismissal ver- 
sus false alarm curves. But for most cases of interest, 
we must resort to Monte Carlo simulations to generate 
the curves. This consists of adding simulated signals to 
simulated (or real) detector noise, and then processing 
the resulting data with a statistic. For each stretch of 
data, the statistic outputs a single number which is then 
compared to a threshold to determine if we should claim 
detection. Since we know if a signal is present in the 
data, we can easily determine the fraction of times that 
the decision rule was in error. In the absence of a signal, 
this procedure yields the false alarm probability a as a 
function of the threshold Aq. In the presence of a signal 
having fixed amplitude e, we obtain the false dismissal 
probability /3e, again as a function of the threshold. If we 
invert a(Ao) for Aq = Ao(a), and substitute this expres- 
sion back into /3e(Ao), we obtain the false dismissal versus 
false alarm curve (3^{a). We can then repeat these steps 
for a different signal amplitude e' to produce a new curve 
/3e' (a) . The final result will be a set of curves similar to 



those shown in Fig. ^. 



False Dismissal vs False Aiarm 




False alarm probability: ct 

FIG. 4: False dismissal versus false alarm curves for a typi- 
cal statistic. Lower curves correspond to larger values of the 
signal amplitude e. 



Alternatively, we can plot 1 — a — /3e or e^^(l — a — 
(3f) versus a, as shown in Figs. || and |[ Note that the 
quantity \ — a — [3^ is the difference of two probabilities: 
1 — Pe is the probability that the statistic exceeds some 
threshold in the presence of a signal (e > 0), while a is the 
probability that the statistic exceeds the same threshold 
in the absence of a signal (i.e., e = 0). Although, Figs. ^ 
and ^ contain the same information as the false dismissal 
versus false alarm curves (Fig. plotting e^^(l — a — /3e) 
versus a has the nice property that, for stochastic signals, 
the curves have a well-defined e limit. 



C. Example 

To illustrate how we can compare different statistics 
using Monte Carlo simulations, consider the simple case 
of a search for a white, Gaussian stochastic background 
signal using two independent, identical, coincident and 
coaligned detectors. Statistic 1 will be the standard 
cross-correlation statistic defined by a white, Gaussian 
stochastic background signal and white, Gaussian detec- 
tor noise. Statistic 2 will be a locally optimal statistic, 
also defined by a white, Gaussian stochastic background 
signal, but with white, 2-component, mixture Gaussian 
noise with an arbitrary knee. We will assume that we 
know (a priori) that the two detectors are identical and 
have uncorrelated white noise. We will not assume, how- 
ever, that we know (a priori) the parameters describing 
the statistical properties of the detector noise or the over- 
all amplitude of the stochastic background signal. Each 
statistic will have to internally estimate the parameters 
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FIG. 5: 1 — Q — /3e versus the false alarm probability a for 
a typical statistic. Lower curves correspond to smaller values 
of the signal amplitude e. 
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False alarm probability: a 



FIG. 6: e~^(l - a - (3,) versus the false alarm probability 
a for a typical statistic. Higher curves correspond to smaller 
values of the signal amplitude e. 



from the detector output, without any other prior knowl- 
edge. 

We perform Monte Carlo simulations of the two statis- 
tics for the following three cases: 

(i) uncorrelated, white, Gaussian detector noise with 
zero mean and unit variance. 

(ii) uncorrelated, white, 2-component, mixture Gaus- 
sian detector noise with zero mean, unit variance, 



a /a = 4, and P = 1% [sec Eq. (2.11 



(iii) uncorrelated, white, exponential detector noise 
with zero mean and unit variance [see Eq. ( p.lO| )]. 



The first two simulations test the optimal behavior of the 
statistics. Statistic 1 is designed for the data of case (i), 
and Statistic 2 is designed for the data of case (ii) . The 
third simulation tests the two statistics in a sub-optimal 
situation, representative of a real search where we do not 
know in advance the exact statistical character of the 
detector noise. 

Details of the Monte Carlo simulations are summarized 
below: 

(i) A single stretch of data consists of iV = 1024 
discrete-time samples. This N is sufficiently large that 
the large observation time approximation is valid. Since 
we are considering white noise (which has zero correlation 
length), any iV > 100 would do. 

(ii) The simulated stochastic gravitational- wave signal 
strengths are = .0025, .005, .010, .020, and .040, where 
e is the ratio of the rms amplitude of the stochastic back- 
ground signal to the rms amplitude of the detector noise. 
These signal strengths correspond to signal-to-noise ra- 



tios (~ e y/ N) ranging from approximately 0.1 to 1 for a 
single stretch of data. 

NOTE: Since a real stochastic background is expected 
to have a smaller value of (~ 10~^), we would need a 
much longer observation time to build-up similar signal- 
to-noise ratios in a real search. The purpose of this ex- 
ample, however, is to illustrate how one can compare two 
different statistics; it is not meant to simulate a real (^ 4 
month) stochastic background search. 

(iii) For all three types of simulated detector noise, 
the standard cross-correlation statistic estimates the vari- 
ance of the noise by calculating the sample variance of a 
stretch of detector output equal to lOOA^. Since the de- 
tector output consists in general of signal plus noise, the 
estimate of the noise variance gets worse as the signal 
amplitude increases. The sample variance is needed to 
specify the white, Gaussian noise model that enters the 
defi nitio n of the standard cross-correlation statistic [c.f. 
Eq. E|: 



(i)A 



GCC 



1 

N 



(4.1) 



where cr^ and cri ^re the estimated variances of the noise 
in detectors 1 and 2, respectively. 

(iv) In addition to estimating the variance of the de- 
tector noise, the locally optimal statistic also estimates 
the variances a, a, and breakpoint Xb of the 2-component, 
mixture Gaussian model that define this statistic. It does 
this by fitting two straight lines to a ln(p(x)) vs. x'^ plot 
obtained from a histogram of a stretch of detector output, 
again equal to lOOA^. Best- fit lines at small x and large 
X, respectively, yield estimates of a and a, while the in- 
tersection of the lines yields an estimate of xi,. Actually, 
only the breakpoints for the detector noise are needed to 
define the following locally optimal statistic: 



1 



GCC 



N-1 

N ^ 



xijO {xib - \xi,j\) X 
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X2,jO {x2b - \x2.j\) ,(4.2) 

which is a truncated version of '^^^Agcc- (See the dis- 
cussion of truncation in the previous subsection.) Here 
Q{x) is the usual step function, which equals if a; < 0, 
and equals 1 if a; > 0. 

NOTE: In order to handle pure Gaussian noise (which 
is a pathological case when one tries to model it as a 2- 
component, mixture Gaussian distribution), the locally 
optimal statistic sets the breakpoint Xh to cxd whenever 
the estimated slopes at small and large values of x have 
a percent difference less than 10%. By doing this, the 
locally optimal statistic '^'Agcc effectively reduces to 
the standard cross-correlation statistic Agcc when the 
noise is pure Gaussian. 

(v) We use 10^ trials to generate each false dismissal 
versus false alarm curve. 

(vi) The simulations were written in Matlab [|l^ . 
The results of the simulation are shown in Figs. I?- 11 



False Dismissal vs False Alarm 

(Gaussian detector noise) 
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False aiarm probabiiity: a 



FIG. 7: False dismissal versus false alarm curves for the 
standard cross-correlation and locally optimal statistics for 
simulated white, Gaussian detector noise. The solid lines cor- 
respond to the standard cross-correlation statistic; the dashed 
Unes correspond to the locally optimal statistic. The top curve 
for each statistic has = .0025; increases by a factor of 
2 as one moves to successively lower curves in the graph. As 
explained in the text, the two statistics perform almost iden- 
tically for this case. 



As noted in (iv) above, our implementation of the 
locally optimal statistic reduces to the standard cross- 
correlation statistic when the detector noise is pure Gaus- 
sian. That is why the false dismissal versus false alarm 
curves for the two statistics are effectively identical in 
Fig. 0. 

From Fig. ^ we see that the locally optimal statis- 
tic performs better than the standard cross-correlation 
statistic when the simulated detector noise is mixture 



False Dismissal vs False Alarm 

(mixture Gaussian detector noise) 




Faise aiarm probabiiity: a 

FIG. 8: False dismissal versus false alarm curves for the stan- 
dard cross-correlation and locally optimal statistics for sim- 
ulated white, 2-component, mixture Gaussian detector noise. 
The solid lines correspond to the standard cross-correlation 
statistic; the dashed lines correspond to the locally optimal 
statistic. The top curve for each statistic has — .0025; 
increases by a factor of 2 as one moves to successively lower 
curves in the graph. Since the locally optimally statistic has 
a lower false dismissal probability /3e(a:) for each false alarm 
probability a. and each signal amplitude e, it is clearly the 
better test for this case, as expected. 



Gaussian. For each value of the stochastic signal strength 
and for each false alarm probability a, the false dis- 
missal probability ji^ (a) for the locally optimal statistic is 
less than that for the standard cross-correlation statistic. 
This is as expected, since the locally optimal statistic was 
constructed precisely to handle mixture Gaussian noise. 

Finally, from Figs, g-^ we see that the locally optimal 
statistic also performs better than the standard cross- 
correlation statistic when the simulated detector noise 
has an exponential distribution. The difference in perfor- 
mance between the two statistics for this case is less than 
that for mixture Gaussian noise, but it is still noticeable. 
(Figure |l^ focuses attention on the false dismissal ver- 
sus false alarm curves for small values of the false alarm 
probability, while Fig. |ll]is a plot of e~^(l — a— /?e) versus 
a, which highlights the difference between the two statis- 
tics in the small signal limit.) This behavior is again as 
expected, since a locally optimal statistic is constructed 
to be less sensitive to the tails of a non-Gaussian distri- 
bution. 



V. CONCLUSION 

In this paper, we have constructed a replacement for 
the standard linear matched filter estimators used for 
gravitational wave detection. The replacements are more 
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False Dismissal vs False Alarm 

(exponential detector noise) 




False alarm probability: a 



We have explicitly illustrated the locally optimal de- 
tection strategies for a variety of different noise PDFs, 
and for two different detection problems (single detec- 

Performance of the statistics 

(exponential detector noise) 



FIG. 9: False dismissal versus false alarm curves for the stan- 
dard cross-correlation and locally optimal statistics for simu- 
lated white, exponential detector noise. The solid lines corre- 
spond to the standard cross-correlation statistic; the dashed 
lines correspond to the locally optimal statistic. The top curve 
for each statistic has = .0025; increases by a factor of 2 
as one moves to successively lower curves in the graph. Since 
the locally optimally statistic has a lower false dismissal prob- 
ability /3e (a) for each false alarm probability a and each signal 
amplitude e, it is the better test for this case. 



False Dismissal vs False Alarm 

(exponential detector noise) 




False alarm probability: a 
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FIG. 11: A plot ofe~^(l — a — /3e) versus the false alarm prob- 
ability a for the standard cross-correlation and locally opti- 
mal statistics for simulated white, exponential detector noise 
in the weak signal limit (small e). The top curve (filled circles) 
corresponds to the locally optimal statistic; the lower curve 
(open circles) corresponds to the standard cross-correlation 
statistic. The difference between the performance of the two 
statistics in the small signal limit is more apparent in this plot 
(cf. Fig. Since the locally optimally statistic has a larger 
value of e~^(l — a — (3^) for each false alarm probability a, it 
is the better test for this case. 



tor known waveform, and two-detector stochastic back- 
ground). In all cases, the optimal strategy is similar to 
the one for Gaussian noise except that data samples that 
lie outside the central part of the distribution (the out- 
liers) are excluded from the sums which form the estima- 
tors. 

We believe that for the future generation of sensitive 
gravitational wave detectors, these strategies may be eas- 
ily implemented and offer an improvement on the existing 
matched filter algorithms. 



FIG. 10: A blow-up of the false dismissal versus false alarm 
curves from Fig. ^ for small values of the false alarm proba- 
bility a. 



robust because they are less susceptible to corruption by 
non-Gaussian detector noise. 
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There are other motivations for dropping the remaining 
terms. In particular, the remaining terms are non-zero 
in the absence of correlated stochastic background: the 
GCC term vanishes in this case. One can show that terms 
2, 3, and 4 are all positive deflnite (since {siSj) is posi- 
tive definite) and that for the expected "Gaussian -I- tail" 
detector probability distribution, term 1 is also positive 
definite. Baysian analysis (in part II of this series of pa- 
pers) also shows that the single-detector terms must be 
droppe d. Th is may also be seen in the analysis at the end 
of Sec. [nC . 

hioo is probably very close to 0.65. 

In fact £ should really be interpreted as "coupling to the 
detector" . The locally optimal statistic is the best choice 
in the "weakly coupled detector" limit. 



